gSpan: Graph-Based Substructure Pattern Mining

Authors

  • Xifeng Yan
  • Jiawei Han
Abstract

We investigate new approaches for frequent graph-based pattern mining in graph datasets and propose a novel algorithm called gSpan (graph-based Substructure pattern mining), which discovers frequent substructures without candidate generation. gSpan builds a new lexicographic order among graphs, and maps each graph to a unique minimum DFS code as its canonical label. Based on this lexicographic order, gSpan adopts the depth-first search strategy to mine frequent connected subgraphs efficiently. Our performance study shows that gSpan substantially outperforms previous algorithms, sometimes by an order of magnitude.
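The idea of using a minimum DFS code as a canonical label can be illustrated with a small brute-force sketch. This is not the authors' implementation: it enumerates every depth-first traversal of a tiny connected, undirected, labeled graph without self-loops, records each traversal as a sequence of 5-tuples (i, j, label_i, edge_label, label_j), and picks the smallest sequence. As a simplifying assumption it compares codes with plain Python tuple ordering rather than the DFS lexicographic order defined by gSpan, so the result is a canonical label but not necessarily the same code gSpan would select. The names `dfs_codes` and `min_dfs_code` and the dictionary-based graph representation are hypothetical.

```python
from typing import Dict, Hashable, Iterator, Tuple

# One DFS-code entry: (i, j, label_i, edge_label, label_j),
# where i and j are discovery indices assigned by the traversal.
Entry = Tuple[int, int, str, str, str]
Code = Tuple[Entry, ...]


def dfs_codes(vlabels: Dict[Hashable, str],
              edges: Dict[Tuple[Hashable, Hashable], str]) -> Iterator[Code]:
    """Yield the code of every depth-first traversal (brute force).

    Assumes a connected, undirected graph with no self-loops.
    """
    adj = {v: set() for v in vlabels}
    elabel = {}
    for (u, v), lab in edges.items():
        adj[u].add(v)
        adj[v].add(u)
        elabel[frozenset((u, v))] = lab

    def extend(code, index, path):
        # Backtrack to the deepest stack vertex that still has an
        # undiscovered neighbour, as a real depth-first search would.
        while path and all(w in index for w in adj[path[-1]]):
            path = path[:-1]
        if not path:
            yield tuple(code)
            return
        u = path[-1]
        for w in adj[u]:
            if w in index:
                continue
            new_index = {**index, w: len(index)}
            # Forward (tree) edge that discovers w.
            step = [(index[u], new_index[w], vlabels[u],
                     elabel[frozenset((u, w))], vlabels[w])]
            # Backward edges from w to already discovered vertices,
            # in increasing order of their discovery index.
            back = sorted((index[x], x) for x in adj[w]
                          if x in index and x != u)
            step += [(new_index[w], ix, vlabels[w],
                      elabel[frozenset((w, x))], vlabels[x])
                     for ix, x in back]
            yield from extend(code + step, new_index, path + [w])

    for start in vlabels:
        yield from extend([], {start: 0}, [start])


def min_dfs_code(vlabels, edges) -> Code:
    """Smallest code under plain tuple ordering (a simplification)."""
    return min(dfs_codes(vlabels, edges))


if __name__ == "__main__":
    # The same labeled triangle under two different vertex namings.
    g1 = ({"a": "C", "b": "C", "c": "O"},
          {("a", "b"): "s", ("b", "c"): "d", ("c", "a"): "s"})
    g2 = ({1: "O", 2: "C", 3: "C"},
          {(3, 2): "s", (2, 1): "d", (1, 3): "s"})
    print(min_dfs_code(*g1) == min_dfs_code(*g2))
```

Because the set of traversal codes does not depend on how vertices are named, two isomorphic graphs share the same minimum, which is what lets a miner detect duplicate patterns by comparing labels instead of running pairwise isomorphism tests. The enumeration is exponential and only suitable for very small graphs.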

Similar resources

Graph-Based Substructure Pattern Mining Using CUDA Dynamic Parallelism

CUDA is an advanced massively parallel computing platform that can provide high performance computing power at much more affordable cost. In this paper, we present a parallel graph-based substructure pattern mining algorithm using CUDA Dynamic Parallelism. The key contribution is a parallel solution to traversing the DFS (Depth First Search) code tree. Furthermore, we implement a parallel frequ...

On the discovery of group-consistent graph substructure patterns from brain networks.

Complex networks constitute a recurring issue in the analysis of neuroimaging data. Recently, network motifs have been identified as patterns of interconnections since they appear in a significantly higher number than in randomized networks, in a given ensemble of anatomical or functional connectivity graphs. The current approach for detecting and enumerating motifs in brain networks requires a...

Frequent Sub-graph Mining on Edge Weighted Graphs

Frequent sub-graph mining entails two significant overheads. The first is concerned with candidate set generation, the second with isomorphism checking. These are also issues with respect to other forms of frequent pattern mining but are exacerbated in the context of frequent sub-graph mining. To reduce the search space, and address these twin overheads, a weighted approach to sub-graph mining...

Optimizing gSpan for Molecular Datasets

We propose two optimizations for mining molecular databases with gSpan, one of the state-of-the-art graph mining algorithms. Both optimizations apply to the enumeration of subgraph occurrences in a graph database, which is, also according to our profiling, the most expensive operation of gSpan. The first optimization reduces the number of subgraph isomorphisms that need to be accessed for prope...

On Canonical Forms for Frequent Graph Mining

In approaches to frequent graph mining that are based on growing subgraphs into a set of graphs, one of the core problems is how to avoid redundant search. A powerful technique to overcome this problem is a canonical description of a graph, which uniquely identifies it, and a corresponding test. This paper introduces a family of canonical forms that are based on systematic ways to construct spa...

Publication date: 2002